    A penalized likelihood estimation approach to semiparametric sample selection binary response modeling

    Sample selection models are employed when an outcome of interest is observed only for a restricted, non-randomly selected sample of the population. We consider the case in which the response is binary and continuous covariates have a nonlinear relationship to the outcome. We introduce two statistical methods for the estimation of two binary regression models involving semiparametric predictors in the presence of non-random sample selection. This is achieved using a multiple-stage procedure and a newly developed simultaneous equation estimation scheme. Both approaches are based on the penalized likelihood estimation framework. The problems of identification and inference are also discussed. The empirical properties of the proposed approaches are studied through a simulation study. The methods are then illustrated using data from the American National Election Study, where the aim is to quantify public support for school integration. If non-random sample selection is neglected, the predicted probability of giving, for instance, a supportive response may be biased, an issue that can be tackled using the proposed tools.
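    The bias that motivates this line of work can be seen in a minimal NumPy sketch (not the authors' method; the correlation, sample size and coefficients below are arbitrary illustrative assumptions): when the errors driving selection and the binary outcome are correlated, the observed sub-sample over-represents one response, so the naive estimate of P(y = 1) computed on selected units differs from the population probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.8  # assumed correlation between selection and outcome errors

x = rng.standard_normal(n)
u = rng.standard_normal(n)                                   # selection error
e = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # outcome error

selected = x + u > 0                   # non-random selection rule
y = (0.5 * x + e > 0).astype(float)    # binary response

p_population = y.mean()        # probability in the full population
p_naive = y[selected].mean()   # estimate using only the selected sub-sample

print(f"population P(y=1)                 = {p_population:.3f}")
print(f"naive estimate on selected sample = {p_naive:.3f}")
```

    With positively correlated errors the naive estimate is markedly too high, which is exactly the distortion a selection equation is introduced to correct.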

    A semiparametric bivariate probit model for joint modeling of outcomes in STEMI patients

    In this work we analyse the relationship between in-hospital mortality and a treatment effectiveness outcome in patients affected by ST-elevation myocardial infarction. The main idea is to carry out a joint modeling of the two outcomes by applying a semiparametric bivariate probit model to data arising from a clinical registry called the STEMI Archive. A realistic quantification of the relationship between the outcomes can be problematic for several reasons. First, latent factors associated with hospital organization can affect treatment efficacy and/or interact with the patient's condition at admission time; moreover, they can also directly influence the mortality outcome. Such factors are hard to measure, so the use of classical estimation methods will result in inconsistent or biased parameter estimates. Second, covariate-outcome relationships can exhibit nonlinear patterns. Provided that proper statistical methods for model fitting in such a framework are available, it is possible to employ a simultaneous estimation approach to account for unobservable confounders. Such a framework can also accommodate flexible covariate structures and model the whole conditional distribution of the response.

    A Bayesian Approach to Phylogenetic Networks

    Traditional phylogenetic inference assumes that the history of a set of taxa can be explained by a tree. This assumption is often violated, as some biological entities can exchange genetic material, giving rise to non-treelike events often called reticulations. Failure to consider these events might result in incorrectly inferred phylogenies, with further consequences, for example stagnant and less targeted drug development. Phylogenetic networks provide a flexible tool which allows us to model the evolutionary history of a set of organisms in the presence of reticulation events. In recent years, a number of methods addressing phylogenetic network reconstruction and evaluation have been introduced. One such method was proposed by Moret et al. (2004), who defined a phylogenetic network as a directed acyclic graph obtained by positing a set of edges between pairs of the branches of an underlying tree to model reticulation events. Recently, two works by Jin et al. (2006) and Snir and Tuller (2009), respectively, using this definition of phylogenetic network, have appeared. Both works demonstrate the potential of using maximum likelihood estimation for phylogenetic network reconstruction. We propose a Bayesian approach to the estimation of phylogenetic network parameters. We allow for different phylogenies to be inferred at different parts of our DNA alignment in the presence of reticulation events, at the species level, by using the idea that a phylogenetic network can be naturally decomposed into trees. A Markov chain Monte Carlo algorithm is provided for posterior computation of the phylogenetic network parameters. A more general algorithm is also proposed which allows the data to dictate how many phylogenies are required to explain them; this is achieved by using stochastic search variable selection. Both algorithms are tested on simulated data and also demonstrated on the ribosomal protein gene rps11 data from five flowering plants.
The proposed approach can be applied to a wide variety of problems which aim at exploring the possibility of reticulation events in the history of a set of taxa.

    Testing exogeneity in the bivariate probit model: Monte Carlo evidence and an application to health economics

    Many economic applications involve the modeling of a binary variable as simultaneously determined with one of its dichotomous regressors. In this paper we deal with a prominent health economics case study, that of cesarean section delivery utilization across public and private hospitals. Estimating the probability of cesarean section in a univariate framework, neglecting the potential endogeneity of the hospital type dummy, might lead to invalid inference. Since little is known about the exact sampling properties of alternative statistics for testing exogeneity of a dichotomous regressor in probit models, we conduct an extensive Monte Carlo experiment. Equipped with the simulation results, we apply a comprehensive battery of tests to an Italian sample of women and find clear evidence against exogeneity of the hospital type dummy. We speculate on the economic implications of these results and discuss the misleading interpretations arising from the adoption of either the univariate probit model or the seemingly unrelated bivariate probit model.
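    Why endogeneity of the dummy matters can be illustrated with a small NumPy simulation (a hedged sketch, not the paper's Monte Carlo design; the correlation, sample size and latent-index coefficients are assumed): when the unobservables driving hospital choice are correlated with the outcome errors, the naive risk difference between the two hospital types overstates the true treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rho = 0.6  # assumed correlation between choice and outcome errors

z = rng.standard_normal(n)                                   # exogenous driver of choice
u = rng.standard_normal(n)
e = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # outcome error

d = (z + u > 0).astype(float)          # endogenous dummy (e.g. private hospital)

# Potential outcomes under a latent-index model with true coefficient 0.5 on d
y1 = (0.5 + e > 0).astype(float)
y0 = (e > 0).astype(float)
y = np.where(d == 1, y1, y0)

true_effect = y1.mean() - y0.mean()                  # approx Phi(0.5) - Phi(0)
naive_effect = y[d == 1].mean() - y[d == 0].mean()   # ignores endogeneity

print(f"true risk difference  = {true_effect:.3f}")
print(f"naive risk difference = {naive_effect:.3f}")
```

    The gap between the two estimates is the bias that an exogeneity test is meant to flag before a univariate probit is trusted.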

    A spline-based framework for the flexible modelling of continuously observed multistate survival processes

    Multistate modelling is becoming increasingly popular due to the availability of richer longitudinal health data. When the times at which the events characterising disease progression occur are known, the modelling of the multistate process is greatly simplified, as it can be broken down into a number of traditional survival models. We propose to flexibly model them through the existing general link-based additive framework implemented in the R package GJRM. The associated transition probabilities can then be obtained through a simulation-based approach implemented in the R package mstate, which is appealing due to its generality. The integration between the two is seamless and efficient since we model a transformation of the survival function, rather than the hazard function as is commonly done. This is achieved through the use of shape-constrained P-splines, which elegantly embed the monotonicity required for the survival functions within the construction of the survival functions themselves. The proposed framework allows for the inclusion of virtually any type of covariate effect, including time-dependent ones, while imposing no restriction on the multistate process assumed. We exemplify the usage of this framework through a case study on breast cancer patients.
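    The monotonicity-embedding idea behind shape-constrained P-splines can be sketched in a few lines of NumPy (an illustration of the general reparameterisation trick, not GJRM's internal code; the parameter values are arbitrary assumptions): the first coefficient is left free and each subsequent difference is written as minus an exponential, so any unconstrained parameter vector maps to a strictly decreasing coefficient sequence, which in turn yields a monotone fitted curve for a B-spline basis.

```python
import numpy as np

# Unconstrained working parameters (arbitrary illustrative values)
theta = np.array([-1.2, 0.0, 0.7, -0.5])
beta_1 = 2.0  # free first coefficient (assumed)

# Monotone reparameterisation: beta_j = beta_1 - sum_{k<=j} exp(theta_k),
# so successive differences are -exp(theta_k) < 0 for any theta.
beta = np.concatenate(([beta_1], beta_1 - np.cumsum(np.exp(theta))))

print("coefficients:", np.round(beta, 3))
print("strictly decreasing:", bool(np.all(np.diff(beta) < 0)))
```

    Because the constraint lives in the parameterisation itself, unconstrained optimisation of the penalized likelihood automatically produces valid (monotone) survival curves.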

    Copula based generalized additive models for location, scale and shape with non-random sample selection

    Non-random sample selection is commonplace in many empirical studies and it appears when an output variable of interest is available only for a restricted non-random sub-sample of data. An extension of the generalized additive models for location, scale and shape which accounts for non-random sample selection by introducing a selection equation is discussed. The proposed approach allows for potentially any parametric distribution for the outcome variable, any parametric link function for the selection equation, several dependence structures between the (outcome and selection) equations through the use of copulae, and various types of covariate effects. Using a special case of the proposed model, it is shown how the score equations are corrected for the bias deriving from non-random sample selection. Parameter estimation is carried out within a penalized likelihood based framework. The empirical effectiveness of the approach is demonstrated through a simulation study and a case study. The models can be easily employed via the gjrm() function in the R package GJRM.

    Evaluating treatment effectiveness under model misspecification: a comparison of targeted maximum likelihood estimation with bias-corrected matching

    Statistical approaches for estimating treatment effectiveness commonly model the endpoint, or the propensity score, using parametric regressions such as generalised linear models. Misspecification of these models can lead to biased parameter estimates. We compare two approaches that combine the propensity score and the endpoint regression and can make weaker modelling assumptions, by using machine learning approaches to estimate the regression function and the propensity score. Targeted maximum likelihood estimation is a double-robust method designed to reduce bias in the estimate of the parameter of interest. Bias-corrected matching reduces bias due to covariate imbalance between matched pairs by using regression predictions. We illustrate the methods in an evaluation of the effects of different types of hip prosthesis on the health-related quality of life of patients with osteoarthritis. We undertake a simulation study, grounded in the case study, to compare the relative bias, efficiency and confidence interval coverage of the methods. We consider data-generating processes with nonlinear functional-form relationships and normal and non-normal endpoints. We find that, across the circumstances considered, bias-corrected matching generally showed less bias, but higher variance, than targeted maximum likelihood estimation. When either targeted maximum likelihood estimation or bias-corrected matching incorporated machine learning, bias was much reduced compared to using misspecified parametric models.
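    The bias-correction step in matching can be made concrete with a deterministic toy example (a hedged sketch, not the paper's estimator; the covariate values, quadratic outcome surface and true effect of 1 are all assumed for illustration): each matched difference is adjusted by the gap in regression predictions between the treated unit and its matched control, removing the residual covariate imbalance.

```python
import numpy as np

# Toy data: outcomes follow y = x^2, plus a true treatment effect of 1
x_control = np.array([0.0, 1.0, 2.0, 3.0])
y_control = x_control ** 2
x_treated = np.array([0.4, 1.4, 2.4])
y_treated = x_treated ** 2 + 1.0

# Nearest-neighbour matching on the covariate
match = np.abs(x_treated[:, None] - x_control[None, :]).argmin(axis=1)

# Naive matched estimate: contaminated by x^2(x_t) - x^2(x_c) imbalance
naive = np.mean(y_treated - y_control[match])

# Bias correction: subtract the prediction gap from a regression
# fitted on the control arm (here an exact quadratic fit)
mhat = np.poly1d(np.polyfit(x_control, y_control, 2))
corrected = np.mean(
    y_treated - y_control[match] - (mhat(x_treated) - mhat(x_control[match]))
)

print(f"naive matched estimate     = {naive:.3f}")
print(f"bias-corrected estimate    = {corrected:.3f}")
```

    Because the regression here recovers the outcome surface exactly, the corrected estimate hits the true effect; with a misspecified parametric regression the correction is only partial, which is why the study pairs the matching estimator with machine-learned predictions.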